309 research outputs found

    VEWS: A Wikipedia Vandal Early Warning System

    We study the problem of detecting vandals on Wikipedia before any human or known vandalism detection system flags them, so that such users can be presented early to Wikipedia administrators. We leverage multiple classical ML approaches, but develop 3 novel sets of features. Our Wikipedia Vandal Behavior (WVB) approach uses a novel set of user editing patterns as features to classify some users as vandals. Our Wikipedia Transition Probability Matrix (WTPM) approach uses a set of features derived from a transition probability matrix and then reduces it via a neural net auto-encoder to classify some users as vandals. The VEWS approach merges the previous two approaches. Without using any information (e.g. reverts) provided by other users, these algorithms each have over 85% classification accuracy. Moreover, when temporal recency is considered, accuracy rises to almost 90%. We carry out detailed experiments on a new data set we have created consisting of about 33K Wikipedia users (including both a black list and a white list of editors) and containing 770K edits. We describe specific behaviors that distinguish between vandals and non-vandals. We show that VEWS beats ClueBot NG and STiki, the best known algorithms today for vandalism detection. Moreover, VEWS detects far more vandals than ClueBot NG and, on average, detects them 2.39 edits before ClueBot NG when both detect the vandal. However, we show that the combination of VEWS and ClueBot NG can give a fully automated vandal early warning system with even higher accuracy.
    Comment: To appear in Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2015).
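
    The abstract describes the WTPM features only at a high level. The sketch below is a minimal illustration, under our own assumptions, of building a row-normalized transition probability matrix over consecutive edit types for each user and compressing it with a small neural-net autoencoder; the edit categories, hidden size, and toy data are hypothetical, not the paper's actual feature definitions.

    # Hypothetical sketch: WTPM-style features compressed with an autoencoder.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    EDIT_TYPES = ["meta", "article", "talk", "revert"]   # hypothetical categories

    def transition_matrix(edit_sequence, types=EDIT_TYPES):
        """Row-normalized matrix of transitions between consecutive edit types."""
        idx = {t: i for i, t in enumerate(types)}
        m = np.zeros((len(types), len(types)))
        for a, b in zip(edit_sequence, edit_sequence[1:]):
            m[idx[a], idx[b]] += 1
        row_sums = m.sum(axis=1, keepdims=True)
        return np.divide(m, row_sums, out=np.zeros_like(m), where=row_sums > 0)

    # One flattened WTPM vector per user (toy data).
    users = [["article", "article", "revert", "talk"],
             ["meta", "article", "article", "article"]]
    X = np.vstack([transition_matrix(u).ravel() for u in users])

    # Autoencoder stand-in: train an MLP to reconstruct its input and keep the
    # hidden-layer activations as the reduced features fed to a vandal classifier.
    ae = MLPRegressor(hidden_layer_sizes=(4,), activation="relu", max_iter=2000,
                      random_state=0).fit(X, X)
    hidden = np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])
    print(hidden.shape)   # (n_users, 4) compressed WTPM features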

    Hybrid knowledge systems

    An architecture called a hybrid knowledge system (HKS) is described that can interoperate between a specification of the control laws describing a physical system; a collection of databases, knowledge bases, and/or other data structures reflecting information about the world in which the controlled physical system resides; observations (e.g. sensor information) from the external world; and actions that must be taken in response to external observations.
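
    The abstract names the components an HKS mediates between without giving an interface. The following is a purely hypothetical rendering of that architecture as a data structure with a single observe-then-act step; every name and the toy control law are invented for illustration.

    # Hypothetical sketch of the components an HKS interoperates between.
    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class HybridKnowledgeSystem:
        control_law: Callable[[Dict], Dict]   # specification of the physical system's control laws
        knowledge_sources: List[Dict]         # databases / knowledge bases about the environment

        def step(self, observation: Dict) -> Dict:
            """Combine an external observation with stored knowledge and return an action."""
            world_state = {k: v for src in self.knowledge_sources for k, v in src.items()}
            world_state.update(observation)   # sensor data overrides stored beliefs
            return self.control_law(world_state)

    hks = HybridKnowledgeSystem(
        control_law=lambda s: {"valve": "close"} if s.get("pressure", 0) > 100 else {"valve": "open"},
        knowledge_sources=[{"pressure_limit": 100}],
    )
    print(hks.step({"pressure": 120}))   # {'valve': 'close'}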

    Mitigating Adversarial Gray-Box Attacks Against Phishing Detectors

    Although machine learning based algorithms have been extensively used for detecting phishing websites, there has been relatively little work on how adversaries may attack such "phishing detectors" (PDs for short). In this paper, we propose a set of Gray-Box attacks on PDs that an adversary may use, which vary depending on the knowledge the adversary has about the PD. We show that these attacks severely degrade the effectiveness of several existing PDs. We then propose the concept of operation chains that iteratively map an original set of features to a new set of features, and develop the "Protective Operation Chain" (POC for short) algorithm. POC leverages the combination of random feature selection and feature mappings in order to increase the attacker's uncertainty about the target PD. Using 3 existing publicly available datasets plus a fourth that we have created and will release upon the publication of this paper, we show that POC is more robust to these attacks than past competing work, while preserving predictive performance when no adversarial attacks are present. Moreover, POC is robust to attacks on 13 different classifiers, not just one. These results are shown to be statistically significant at the p < 0.001 level.
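
    The abstract describes operation chains and random feature selection only at a high level. The sketch below illustrates that general idea under our own assumptions: randomly select a feature subset and push it through a random chain of mappings before training, so an attacker cannot be sure which feature space the detector sees. The specific operations, sizes, and toy data are assumptions; this is not the paper's POC algorithm.

    # Hypothetical sketch of an operation-chain style feature randomization.
    import numpy as np

    rng = np.random.default_rng(0)

    def random_operation_chain(n_features, n_selected, chain_length):
        """Return (selected feature indices, list of mapping functions) chosen at random."""
        selected = rng.choice(n_features, size=n_selected, replace=False)
        ops = [lambda x: np.log1p(np.abs(x)),       # compress magnitudes
               lambda x: x ** 2,                    # nonlinear remapping
               lambda x: x - x.mean(axis=0)]        # recentering
        chain = [ops[i] for i in rng.integers(0, len(ops), size=chain_length)]
        return selected, chain

    def apply_chain(X, selected, chain):
        Z = X[:, selected]
        for op in chain:
            Z = op(Z)
        return Z

    X = rng.normal(size=(200, 30))               # toy phishing feature matrix
    y = (X[:, 0] + X[:, 1] > 0).astype(int)      # toy labels; a downstream detector would be fit on (Z, y)
    selected, chain = random_operation_chain(X.shape[1], n_selected=15, chain_length=3)
    Z = apply_chain(X, selected, chain)          # detector is trained on Z, not X
    print(Z.shape)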

    Strong Completeness Results for Paraconsistent Logic Programming

    In [6], we introduced a means of allowing logic programs to contain negations in both the head and the body of a clause. Such programs were called generally Horn programs (GHPs, for short). The model-theoretic semantics of GHPs were defined in terms of four-valued Belnap lattices [5]. For a class of programs called well-behaved programs, an SLD-resolution-like proof procedure was introduced. This procedure was proven (under certain restrictions) to be sound (for existential queries) and complete (for ground queries). In this paper, we remove the restriction that programs be well-behaved and extend our soundness and completeness results to apply to arbitrary existential queries and to arbitrary GHPs. This is the strongest possible completeness result for GHPs. The results reported here apply to the design of very large knowledge bases and to the processing of queries against knowledge bases that may contain erroneous information.
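
    As a small illustrative sketch (not the paper's formal machinery), Belnap's four-valued lattice can be encoded by representing each value as a set of classical truth values, with the knowledge ordering given by set inclusion; combining contradictory evidence yields "both" rather than triviality, which is what lets GHPs tolerate inconsistent information.

    # Hypothetical encoding of Belnap's four values as sets of classical truth values.
    NONE, TRUE, FALSE, BOTH = frozenset(), frozenset({"t"}), frozenset({"f"}), frozenset({"t", "f"})
    NAMES = {NONE: "none", TRUE: "true", FALSE: "false", BOTH: "both"}

    def k_join(a, b):
        """Least upper bound in the knowledge ordering: pool the evidence."""
        return frozenset(a | b)

    def k_meet(a, b):
        """Greatest lower bound in the knowledge ordering: shared evidence."""
        return frozenset(a & b)

    # A clause asserting p combined with one asserting not-p yields "both",
    # instead of collapsing the whole program into inconsistency.
    print(NAMES[k_join(TRUE, FALSE)])   # both
    print(NAMES[k_meet(TRUE, BOTH)])    # true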

    Abduction in Annotated Probabilistic Temporal Logic

    Annotated Probabilistic Temporal (APT) logic programs are a form of logic programs that allow users to state (or systems to automatically learn) rules of the form "formula G becomes true K time units after formula F became true with L to U% probability." In this paper, we develop a theory of abduction for APT logic programs. Specifically, given an APT logic program Pi, a set of formulas H that can be "added" to Pi, and a goal G, is there a subset S of H such that Pi ∪ S is consistent and entails the goal G? We study the complexity of this Basic APT Abduction Problem (BAAP). We then leverage a geometric characterization of BAAP to suggest a set of pruning strategies when solving BAAP and use these intuitions to develop a sound and complete algorithm.
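
    To make the problem statement concrete, here is a brute-force reading of BAAP as a subset search, with consistency and entailment left as caller-supplied oracles; this is only a sketch of the problem definition from the abstract, and it does not implement APT-logic reasoning or the paper's geometric pruning strategies.

    # Hypothetical brute-force reading of BAAP: find S ⊆ H with Pi ∪ S
    # consistent and entailing G. The oracles are placeholders.
    from itertools import chain, combinations

    def baap(Pi, H, G, is_consistent, entails):
        """Return some S ⊆ H such that Pi ∪ S is consistent and entails G, else None."""
        # Enumerate subsets smallest-first so the first hit is a minimal explanation.
        subsets = chain.from_iterable(combinations(H, r) for r in range(len(H) + 1))
        for S in subsets:
            program = set(Pi) | set(S)
            if is_consistent(program) and entails(program, G):
                return set(S)
        return None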

    Deception Detection in Group Video Conversations using Dynamic Interaction Networks

    Detecting groups of people who are jointly deceptive in video conversations is crucial in settings such as meetings, sales pitches, and negotiations. Past work on deception in videos focuses on detecting a single deceiver and uses facial or visual features only. In this paper, we propose the concept of Face-to-Face Dynamic Interaction Networks (FFDINs) to model the interpersonal interactions within a group of people. The use of FFDINs enables us to leverage network relations in detecting group deception in video conversations for the first time. We use a dataset of 185 videos from a deception-based game called Resistance. We first characterize the behavior of individuals, pairs, and groups of deceptive participants and compare them to non-deceptive participants. Our analysis reveals that pairs of deceivers tend to avoid mutual interaction and focus their attention on non-deceivers. In contrast, non-deceivers interact with everyone equally. We propose Negative Dynamic Interaction Networks (NDINs) to capture the notion of missing interactions. We create the DeceptionRank algorithm to detect deceivers from NDINs extracted from videos that are just one minute long. We show that our method outperforms recent state-of-the-art computer vision, graph embedding, and ensemble methods by at least 20.9% AUROC in identifying deception from videos.
    Comment: The paper is published at ICWSM 2021. Dataset link: https://snap.stanford.edu/data/comm-f2f-Resistance.htm
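
    The abstract does not spell out DeceptionRank itself. The sketch below only illustrates the general idea of a negative (missing-interaction) network scored with a PageRank-style walk, using networkx; the weighting scheme, toy attention matrix, and the use of plain PageRank are assumptions for illustration, not the paper's algorithm.

    # Hypothetical sketch: build a negative interaction network from observed
    # attention counts, then rank participants by a PageRank-style score.
    import numpy as np
    import networkx as nx

    attention = np.array([        # toy minute-long interaction counts, row -> column
        [0, 5, 1, 0],
        [4, 0, 0, 1],
        [1, 0, 0, 6],
        [0, 1, 5, 0],
    ], dtype=float)

    # Negative network: weight edges by how much interaction is *missing*
    # relative to the most-attended pair; no self-loops.
    missing = attention.max() - attention
    np.fill_diagonal(missing, 0)

    G = nx.from_numpy_array(missing, create_using=nx.DiGraph)
    scores = nx.pagerank(G, alpha=0.85, weight="weight")
    print(sorted(scores, key=scores.get, reverse=True))   # participants ranked by "avoidance"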